Data Visualization with R

Tony Yao-Jen Kuo

Overview

Sli.do

https://www.sli.do/

Event code: 5421

Agenda

  • gapminder
  • dplyr
  • ggplot2
  • plotly
  • shiny
  • Case study

Introducing the data visualization ecosystem in R

  • magrittr for chaining functions (its %>% pipe is included in dplyr)
  • gapminder for a great dataset to plot
  • dplyr for data manipulation
  • ggplot2 for data visualization
  • plotly and shiny for interactive data visualization
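The pipe passes the value on its left as the first argument of the call on its right, so `x %>% f(y)` means `f(x, y)`. The small sketch below (the numbers are arbitrary) compares a nested call with the same steps written as a chain:

```r
library(dplyr)  # re-exports the %>% pipe from magrittr

x <- c(1, 4, 9)

# Nested calls are read inside-out ...
nested <- sum(sqrt(x))

# ... while the pipe reads left to right, one step at a time
piped <- x %>% sqrt() %>% sum()

print(piped)  # 6
```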

We will go through each of them briefly.

First of all, update R to the latest release.

https://cran.r-project.org/

Installing and loading these packages
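One way to do this is sketched below. It installs only the packages that are missing and then attaches all five. The CRAN mirror URL is a choice made here, not something the slides prescribe:

```r
pkgs <- c("gapminder", "dplyr", "ggplot2", "plotly", "shiny")

# Install only the packages that are not installed yet (needs internet access)
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach every package for the current R session
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}
```

Installation is needed once per machine. The `library()` calls are needed in every new session.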

gapminder

The story of Hans Rosling and Gapminder

https://youtu.be/jbkSRLYSojo

dplyr

Basic functions in dplyr

  • filter()
  • select()
  • arrange()
  • mutate()
  • summarise()
  • group_by()

filter() for subsetting rows

## # A tibble: 12 x 6
##    country continent  year lifeExp      pop gdpPercap
##    <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Taiwan  Asia       1952    58.5  8550362     1207.
##  2 Taiwan  Asia       1957    62.4 10164215     1508.
##  3 Taiwan  Asia       1962    65.2 11918938     1823.
##  4 Taiwan  Asia       1967    67.5 13648692     2644.
##  5 Taiwan  Asia       1972    69.4 15226039     4063.
##  6 Taiwan  Asia       1977    70.6 16785196     5597.
##  7 Taiwan  Asia       1982    72.2 18501390     7426.
##  8 Taiwan  Asia       1987    73.4 19757799    11055.
##  9 Taiwan  Asia       1992    74.3 20686918    15216.
## 10 Taiwan  Asia       1997    75.2 21628605    20207.
## 11 Taiwan  Asia       2002    77.0 22454239    23235.
## 12 Taiwan  Asia       2007    78.4 23174294    28718.
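The Taiwan subset above can be produced with a single `filter()` condition. This is a sketch, because the slide's original code is not shown:

```r
library(gapminder)
library(dplyr)

# Keep only the rows whose country is Taiwan
taiwan <- gapminder %>%
  filter(country == "Taiwan")

taiwan
```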

select() for extracting columns

## # A tibble: 12 x 3
##     year gdpPercap lifeExp
##    <int>     <dbl>   <dbl>
##  1  1952     1207.    58.5
##  2  1957     1508.    62.4
##  3  1962     1823.    65.2
##  4  1967     2644.    67.5
##  5  1972     4063.    69.4
##  6  1977     5597.    70.6
##  7  1982     7426.    72.2
##  8  1987    11055.    73.4
##  9  1992    15216.    74.3
## 10  1997    20207.    75.2
## 11  2002    23235.    77.0
## 12  2007    28718.    78.4
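A sketch that yields the three columns above, in the order shown, assuming the same Taiwan subset as in the `filter()` example:

```r
library(gapminder)
library(dplyr)

# Filter to Taiwan, then keep three columns in the order listed
taiwan_cols <- gapminder %>%
  filter(country == "Taiwan") %>%
  select(year, gdpPercap, lifeExp)

taiwan_cols
```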

arrange() for sorting rows based on certain variables

## # A tibble: 33 x 6
##    country          continent  year lifeExp        pop gdpPercap
##    <fct>            <fct>     <int>   <dbl>      <int>     <dbl>
##  1 Myanmar          Asia       2007    62.1   47761980      944 
##  2 Afghanistan      Asia       2007    43.8   31889923      975.
##  3 Nepal            Asia       2007    63.8   28901790     1091.
##  4 Bangladesh       Asia       2007    64.1  150448339     1391.
##  5 Korea, Dem. Rep. Asia       2007    67.3   23301725     1593.
##  6 Cambodia         Asia       2007    59.7   14131858     1714.
##  7 Yemen, Rep.      Asia       2007    62.7   22211743     2281.
##  8 Vietnam          Asia       2007    74.2   85262356     2442.
##  9 India            Asia       2007    64.7 1110396331     2452.
## 10 Pakistan         Asia       2007    65.5  169270617     2606.
## # ... with 23 more rows
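The output above looks like the Asian countries in 2007, sorted by GDP per capita from lowest to highest. One way to reproduce it, inferred from the printed rows:

```r
library(gapminder)
library(dplyr)

# Asian countries in 2007, from lowest to highest GDP per capita
asia_2007 <- gapminder %>%
  filter(continent == "Asia", year == 2007) %>%
  arrange(gdpPercap)   # use arrange(desc(gdpPercap)) for descending order

asia_2007
```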

mutate() for creating new columns

## # A tibble: 12 x 7
##    country continent  year lifeExp      pop gdpPercap gdp_million
##    <fct>   <fct>     <int>   <dbl>    <int>     <dbl>       <dbl>
##  1 Taiwan  Asia       1952    58.5  8550362     1207.      10320.
##  2 Taiwan  Asia       1957    62.4 10164215     1508.      15326.
##  3 Taiwan  Asia       1962    65.2 11918938     1823.      21727.
##  4 Taiwan  Asia       1967    67.5 13648692     2644.      36085.
##  5 Taiwan  Asia       1972    69.4 15226039     4063.      61856.
##  6 Taiwan  Asia       1977    70.6 16785196     5597.      93939.
##  7 Taiwan  Asia       1982    72.2 18501390     7426.     137398.
##  8 Taiwan  Asia       1987    73.4 19757799    11055.     218414.
##  9 Taiwan  Asia       1992    74.3 20686918    15216.     314765.
## 10 Taiwan  Asia       1997    75.2 21628605    20207.     437045.
## 11 Taiwan  Asia       2002    77.0 22454239    23235.     521734.
## 12 Taiwan  Asia       2007    78.4 23174294    28718.     665526.
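The new `gdp_million` column matches GDP per capita times population, divided by one million (e.g. 1207 × 8,550,362 / 10⁶ ≈ 10,320). A sketch under that reading:

```r
library(gapminder)
library(dplyr)

# Total GDP in millions = GDP per capita * population / 1e6
taiwan_gdp <- gapminder %>%
  filter(country == "Taiwan") %>%
  mutate(gdp_million = gdpPercap * pop / 1e6)

taiwan_gdp
```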

summarise() for a summary

## # A tibble: 1 x 1
##   `median(gdpPercap)`
##                 <dbl>
## 1               3532.
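The column header `median(gdpPercap)` suggests an unnamed summary over the whole data set. A sketch:

```r
library(gapminder)
library(dplyr)

# Collapse the whole data set into one row: the median GDP per capita
overall <- gapminder %>%
  summarise(median(gdpPercap))

overall
```

Writing `summarise(medianGdpPercap = median(gdpPercap))` gives the column a cleaner name.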

group_by() for a grouped summary

## # A tibble: 5 x 2
##   continent medianGdpPercap
##   <fct>               <dbl>
## 1 Africa              1192.
## 2 Americas            5466.
## 3 Asia                2647.
## 4 Europe             12082.
## 5 Oceania            17983.
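Adding `group_by()` before `summarise()` computes one median per continent instead of one overall. A sketch that yields the table above:

```r
library(gapminder)
library(dplyr)

# One median per continent instead of one for the whole data set
by_continent <- gapminder %>%
  group_by(continent) %>%
  summarise(medianGdpPercap = median(gdpPercap))

by_continent
```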

Going further with dplyr

https://dplyr.tidyverse.org/

ggplot2

gg stands for…

Grammar of graphics.

Basic concepts

  • ggplot(data, aes(x = , y = , color = , fill = , ...)) for mapping data columns to aesthetics
  • geom_*() functions for the different chart types
  • Using + to add different layers

geom_point() for exploring correlations

Rendering scatter plot
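The slide's plotting code is not shown. The sketch below assumes the classic gapminder pairing of GDP per capita against life expectancy. The colour mapping and log scale are choices made here, not part of the original:

```r
library(gapminder)
library(ggplot2)

# Map GDP per capita to x and life expectancy to y; draw one point per row
scatter <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()   # GDP per capita is heavily right-skewed

scatter
```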

geom_histogram() for exploring distributions

Rendering histogram
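A minimal histogram sketch. The variable (`lifeExp`) and the bin count are assumptions, because the original code is not shown:

```r
library(gapminder)
library(ggplot2)

# Count how many rows fall into each life-expectancy bin
hist_plot <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(bins = 30)

hist_plot
```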

geom_bar() for exploring row counts

Rendering bar plot
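By default `geom_bar()` counts the rows in each category, so only an x mapping is needed. A sketch that assumes continents as the category:

```r
library(gapminder)
library(ggplot2)

# geom_bar() counts rows per category by default (stat = "count")
count_plot <- ggplot(gapminder, aes(x = continent)) +
  geom_bar()

count_plot
```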

geom_bar() for grouped summary

Rendering another bar plot
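To plot a grouped summary, compute it with dplyr first and tell `geom_bar()` to use the y values as they are. This sketch assumes the median-by-continent table from the `group_by()` slide:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

median_by_continent <- gapminder %>%
  group_by(continent) %>%
  summarise(medianGdpPercap = median(gdpPercap))

# stat = "identity" uses medianGdpPercap directly as the bar height
median_plot <- ggplot(median_by_continent, aes(x = continent, y = medianGdpPercap)) +
  geom_bar(stat = "identity")

median_plot
```

`geom_col()` is shorthand for `geom_bar(stat = "identity")`.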

Going further with ggplot2

https://ggplot2.tidyverse.org/

plotly

About plotly

Create interactive, D3 and WebGL charts in R.

Quickstart with plotly

Converting ggplot2 graphs to interactive versions with ggplotly().

Converting our last bar plot

Rendering interactive bar plot
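A sketch of the conversion, assuming the last bar plot is the median-by-continent chart. `ggplotly()` takes the finished ggplot object and returns a plotly widget:

```r
library(gapminder)
library(dplyr)
library(ggplot2)
library(plotly)

median_by_continent <- gapminder %>%
  group_by(continent) %>%
  summarise(medianGdpPercap = median(gdpPercap))

median_plot <- ggplot(median_by_continent, aes(x = continent, y = medianGdpPercap)) +
  geom_bar(stat = "identity")

# Wrap the static ggplot in an interactive widget (hover, zoom, pan)
interactive_plot <- ggplotly(median_plot)

# In RStudio, run `interactive_plot` to display it in the Viewer pane
```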

Plotting a gapminder replica with plotly

Rendering gapminder replica with plotly
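A sketch of a Rosling-style animated bubble chart built directly with `plot_ly()`. The choice of aesthetics follows the famous Gapminder chart and is an assumption about what the original slide did. `frame = ~year` adds a play button that steps through the years:

```r
library(gapminder)
library(plotly)

# x = GDP per capita (log axis), y = life expectancy,
# bubble size = population, colour = continent, one frame per year
replica <- plot_ly(
  gapminder,
  x = ~gdpPercap, y = ~lifeExp,
  size = ~pop, color = ~continent,
  frame = ~year, text = ~country, hoverinfo = "text",
  type = "scatter", mode = "markers"
) %>%
  layout(xaxis = list(type = "log"))

# In RStudio, run `replica` to view it and press Play to animate
```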

shiny

About shiny

Shiny is an R package that makes it easy to build interactive web applications (apps) straight from R.

What is a Shiny app

  • Shiny apps are contained in a single script called app.R
  • app.R lives in a directory (for example, newdir/), and the app can be run with runApp("newdir")

app.R has three components

  • a user interface object
  • a server function
  • a call to the shinyApp function

How do these three components collaborate

  • The user interface (ui) object controls the layout and appearance
  • The server function contains the instructions that your computer needs to build your app
  • The shinyApp() function creates Shiny app objects

A hello shiny app
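A minimal sketch with the three components described above. The slider and histogram are illustrative choices, not necessarily the app shown in the talk. Saved as app.R, the file would normally end with a bare `shinyApp(ui = ui, server = server)`. Here the object is assigned to `app` so that running the script does not launch it immediately:

```r
library(shiny)

# 1. User interface object: layout and appearance
ui <- fluidPage(
  titlePanel("Hello Shiny!"),
  sliderInput("n", "Number of observations:", min = 1, max = 500, value = 100),
  plotOutput("hist")
)

# 2. Server function: turns inputs into outputs
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste(input$n, "random normal values"))
  })
}

# 3. shinyApp() combines ui and server into an app object
app <- shinyApp(ui = ui, server = server)

# runApp(app)   # start it in an interactive session
```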

Creating a gapminder replica with shiny and plotly
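One possible sketch combines the plotly bubble chart with a Shiny slider that picks the year. `plotlyOutput()` and `renderPlotly()` come from plotly. The layout and the slider design are assumptions, not the author's app:

```r
library(shiny)
library(plotly)
library(gapminder)

ui <- fluidPage(
  titlePanel("Gapminder replica"),
  # gapminder years run from 1952 to 2007 in steps of 5
  sliderInput("year", "Year:", min = 1952, max = 2007, value = 2007,
              step = 5, sep = ""),
  plotlyOutput("bubbles")
)

server <- function(input, output) {
  output$bubbles <- renderPlotly({
    # Keep only the rows for the year chosen on the slider
    one_year <- gapminder[gapminder$year == input$year, ]
    plot_ly(one_year,
            x = ~gdpPercap, y = ~lifeExp,
            size = ~pop, color = ~continent, text = ~country,
            type = "scatter", mode = "markers") %>%
      layout(xaxis = list(type = "log"))
  })
}

app <- shinyApp(ui = ui, server = server)

# runApp(app)   # start it in an interactive session
```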

Case study

Getting our case study data

## [1] 62689     7
##   admin_area district village office votes  party candidate
## 1     台北市   北投區  建民里      1     4 無黨籍    吳蕚洋
## 2     台北市   北投區  建民里      2     2 無黨籍    吳蕚洋
## 3     台北市   北投區  建民里      3     2 無黨籍    吳蕚洋
## 4     台北市   北投區  文林里      4     1 無黨籍    吳蕚洋
## 5     台北市   北投區  文林里      5     5 無黨籍    吳蕚洋
## 6     台北市   北投區  文林里      6     3 無黨籍    吳蕚洋
##       admin_area district                        village office votes
## 62684     連江縣   南竿鄉 馬祖村、津沙村、四維村、仁愛村      4   654
## 62685     連江縣   北竿鄉         后沃村、橋仔村、塘岐村      5   838
## 62686     連江縣   北竿鄉         坂里村、白沙村、芹壁村      6   301
## 62687     連江縣   莒光鄉         田沃村、西坵村、青帆村      7   341
## 62688     連江縣   莒光鄉                 大坪村、福正村      8   391
## 62689     連江縣   東引鄉                 樂華村、中柳村      9   396
##            party candidate
## 62684 中國國民黨    劉增應
## 62685 中國國民黨    劉增應
## 62686 中國國民黨    劉增應
## 62687 中國國民黨    劉增應
## 62688 中國國民黨    劉增應
## 62689 中國國民黨    劉增應
##   admin_area          district           village              office      
##  Length:62689       Length:62689       Length:62689       Min.   :   1.0  
##  Class :character   Class :character   Class :character   1st Qu.: 190.0  
##  Mode  :character   Mode  :character   Mode  :character   Median : 457.0  
##                                                           Mean   : 602.8  
##                                                           3rd Qu.: 934.0  
##                                                           Max.   :2446.0  
##      votes           party            candidate        
##  Min.   :   0.0   Length:62689       Length:62689      
##  1st Qu.:  12.0   Class :character   Class :character  
##  Median : 148.0   Mode  :character   Mode  :character  
##  Mean   : 199.5                                        
##  3rd Qu.: 358.0                                        
##  Max.   :1237.0                                        
## 'data.frame':    62689 obs. of  7 variables:
##  $ admin_area: chr  "台北市" "台北市" "台北市" "台北市" ...
##  $ district  : chr  "北投區" "北投區" "北投區" "北投區" ...
##  $ village   : chr  "建民里" "建民里" "建民里" "文林里" ...
##  $ office    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ votes     : int  4 2 2 1 5 3 0 4 5 3 ...
##  $ party     : chr  "無黨籍" "無黨籍" "無黨籍" "無黨籍" ...
##  $ candidate : chr  "吳蕚洋" "吳蕚洋" "吳蕚洋" "吳蕚洋" ...
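The output above matches `dim()`, `head()`, `tail()`, `summary()` and `str()` applied to one data frame. The slides do not give the data's location, so the file path and the name `election` below are hypothetical placeholders. The helper `inspect_data()` is also hypothetical. `fileEncoding = "UTF-8"` assumes a UTF-8 CSV, which matters for the Chinese text:

```r
# Hypothetical helper that prints the same five views shown above
inspect_data <- function(df) {
  print(dim(df))      # number of rows and columns
  print(head(df))     # first six rows
  print(tail(df))     # last six rows
  print(summary(df))  # per-column summaries
  str(df)             # column types and first values
  invisible(df)
}

# Placeholder path: point it at your own copy of the case study data
# election <- read.csv("path/to/case_study.csv", stringsAsFactors = FALSE,
#                      fileEncoding = "UTF-8")
# inspect_data(election)

# Works on any data frame, e.g. a built-in one:
inspect_data(mtcars)
```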

Try creating a visualization by yourself!
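If you need a starting point, the sketch below totals votes per candidate and colours the bars by party. It uses only the columns shown above (`candidate`, `party`, `votes`). The function name is hypothetical, and `election` stands for however you loaded the data:

```r
library(dplyr)
library(ggplot2)

# Hypothetical starter: total votes per candidate, coloured by party
plot_votes_by_candidate <- function(df) {
  totals <- df %>%
    group_by(candidate, party) %>%
    summarise(total_votes = sum(votes), .groups = "drop")

  ggplot(totals, aes(x = reorder(candidate, total_votes), y = total_votes, fill = party)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    labs(x = "candidate", y = "total votes")
}

# plot_votes_by_candidate(election)
```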